-
-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test EncodedSring#to_s for undefined conversion / invalid byte sequence #134
Conversation
728cf6d
to
ff988ae
Compare
@@ -21,6 +21,47 @@ module RSpec::Support | |||
end | |||
end | |||
|
|||
describe '#to_s' do | |||
if RUBY_VERSION == '1.9.2' && RSpec::Support::Ruby.mri? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Protip(tm) you can shorten this to Ruby.mri?
given you're in the Support
module already, however you shouldn't limit it to Ruby version 1.9.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'm switching back and forth between the two, so forgot my context :)
It actually behaves differently on 1.9.2.. https://travis-ci.org/rspec/rspec-support/jobs/41566833#L113
1) RSpec::Support::EncodedString#to_s does nothing to an invalid byte sequence
Failure/Error: expect(resulting_string.to_s).to eq("\xEF hi I am not going to work")
expected: "\xEF hi I am not going to work"
got: "\xEF hi I am not going to work"
b70f78e
to
5325c3f
Compare
expect(resulting_string.to_s).to eq("? hi I am not going to work") | ||
end | ||
|
||
it 'replaces all characters with either a ? or a unicode ?' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doc string doesn't make sense to me...it seems to claim that EncodedString#to_s
returns a string containing all question marks no matter what the original string is. Can you wrap this in a descriptive context that describes when this behavior occurs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It surprised me as well. I just made the test pass from the actual output based on the current behavior. split
to_s
and <<
don't result in the same type of string. maybe it should be looked at. Also see @JonRowe in rspec/rspec-core#1760 (comment) and my followup
I thought adding the describe '#to_s' do
was sufficient. Otherwise it requires a bit more investigation to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought adding the describe '#to_s' do was sufficient. Otherwise it requires a bit more investigation to understand
Well, there are other examples in this group where calling #to_s
does not result in a string of all question marks, so clearly it 'replaces all characters with either a ? or a unicode ?'
isn't the behavior you always see when #to_s
is called. Are UTF-16LE
and IBM737
completely incompatible encodings? Is that what causes that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a too-long break I'm back looking to finish the PRs. I'll update the context to make the tests clearer and see if I can clarify the reasons for different encoding behavior. I think that would resolve the blocker.
5325c3f
to
c8b0521
Compare
end | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think of this? I kind of like it but I've never seen it before. I found it while looking through the Ruby encoding / transcoding code and tests.
@myronmarston going to sleep now, will answer questions in the morning.. this has been a bit of a slog.. in some cases, to answer your question, it may be in a commit message.. some of your questions are about how we want it to behave (replace with '?' vs. something else) which I don't have an opinion on |
I do intend to rebase and clean this up.. esp now that the tests all passed. |
@@ -47,8 +47,6 @@ def diff_as_string(actual, expected) | |||
finalize_output(output, hunks.last.diff(format_type).to_s) if hunks.last | |||
|
|||
color_diff output | |||
rescue Encoding::CompatibilityError | |||
handle_encoding_errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being removed because of https://github.com/rspec/rspec-expectations/pull/220/files#diff-9533f5f156a38a3307ecfc610d2282d7R47 ?
I ask because rspec-mocks uses the differ in RSpec 2.2....but it doesn't rescue this error. It makes me wonder if we should move this rescue back into here and/or add a similar rescue to rspec-mocks.
@JonRowe, give that you authored rspec/rspec-expectations#220, what do you think should be done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm noticing that rspec-expectations doesn't rescue this error anymore, either...so why is it safe to remove, @bf4? (Apologies if your commit messages explain but at 23 commits it's a lot to look through).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading through the rest of the diff it looks like this error isn't possible anymore since EncodedString
handles so many more cases internally...is that right, @bf4?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@myronmarston That's my experience thus far. I have been unable to cause the code to fail with that exception, which is why I tracked down where it came from. Specifically, as I noted in the commit message bf4@96584ba
The Differ no longer catches CompatibilityErrors
This was introduced in
https://github.com/rspec/rspec-expectations/pull/220/files#diff-9533f5f156a38a3307ecfc610d2282d7R47
but is no longer raised either by the the current test code, the
original test code
@expected="Tu avec carté {count} itém has".encode('UTF-16LE')
@actual="Tu avec carte {count} item has".encode('UTF-16LE')
or any other variations I've tried
Since I'm moving all the encoding logic into EncodedString, and this
doesn't appear to do anything any more, I am proposing to remove it.
As I commented elsewhere, I want to confirm this isn't still a problem on 1.9.2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how notoriously difficult it is to actually craft an invalidly encoded string when you run a conventually encoded environment I'm uncomfortable with this FYI
Thanks @bf4, this is looking good. I think I understand it now. Squashing commits during a rebase will definitely help keep the history here more understandable so please do that when you're ready. |
@myronmarston I think I'm about ready to rebase the commits. I'll push a few first addressing any remaining issues. I was following the thoughtbot guide where when a PR is being discussed, to continue adding commits so that changes are easy to see, but then rebase when done to tell a better story. I started doing that because I was making so many changes, I wanted a better history of what I was trying, rather than what I ended up with :) |
actual_bytes = resulting_string.each_byte.to_a | ||
it 'uses the default external encoding when the two strings have incompatible encodings' do | ||
utf8_string = build_encoded_string("Tu avec carte {count} item has\n", "UTF-8") | ||
utf8_incompatible_string = "Tu avec carté {count} itém has\n".encode('UTF-16LE') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests with the 'Tu avec' came from the differ, since I thought the é
was a good scenario we weren't testing here
def diff(actual, expected) | ||
diff = "" | ||
diff = EMPTY_DIFF.dup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was ""
insufficient?
Also, is the dup
just to guard against mutations (to ensure EMPTY_DIFF
isn't mutated)? If there aren't any mutations maybe you could freeze the constant instead and use it w/o duping for fewer allocations.
Hey @bf4, I really appreciate the effort you're putting into this, but I also feel like this PR keeps shifting significantly every time I take a look at it. As lead RSpec maintainer I think it's important that I understand any code I merge (as I'll need to be able to refactor it and fix bugs in it in the future) but there's a lot going on here that I don't understand and every time I look at it it seems like there's new stuff I don't understand and I still don't understand the old stuff :(. At this point it seems like here are many concerns being addressed all in one PR:
Having all these concerns is making it much harder for me to understand what the actual affect of this PR is and also to ensure we're not losing coverage in all the spec changes. I hate making more work for you (especially since you've done so much work on this PR already...) but do you think you could open up a few smaller PRs that address a single aspect each? That would make your changes much easier to review and reason about and hopefully we could get the stuff that's not still an open question merged quickly and then be left with whatever the WIP stuff is in a much smaller PR. Ideally I'd like to keep spec refactorings distinct from implementation refactorings and bug fixes. |
What does that have to do with your changes here? (I don't see any relation...) Or are you just recommending we change how rspec-support handles shelling out? If it's the latter, please open a separate PR for that -- mixing concerns all in one PR is making my head spin! |
@myronmarston thanks for your review. I think you're right that this can now be broken up into smaller PRs. As you know, that happens sometimes that some changes expose others :) I know the scope has changed a bit, but that's mostly because I've been trying to figure out how to 1) cover the different Encoding exceptions and 2) reliably test them. I see a few types of comments
First, it checks if the strings are the same encoding. If not, it's a failure. If we did a straight-up bytes comparison, we would just get a diff of numbers which is pretty undreadable. So, what I try to do, is encode all the codepoints I can, then compare the characters in the strings until there is a difference, then output that. Rescuing the ArgumentError in safe_codepoints is if the string has an invalid byte sequence, it raises that on string.codepoints, so I fall back to just bytes. There's probably other ways to do it :) |
From what I can tell, it looks like that didn't actually fix the JRuby build? According to the travis issue there was an RVM bug that caused jruby-head to get installed as jruby so we've just been planning to wait until the fix is rolled out. If there was a simple fix like that I'd be in favor but I'd want it made in rspec-dev and pushed out to all rspec repos.
I think that
Gotcha. Thanks for explaining that. I think the logic you've written could benefit from being refactored into a custom matcher, that uses a simple string equality check for |
JRUBY_OPTS just a workaround so i can use ci. We do already modify LC_ not sure what is most worth my time. Maybe in a refactor? Those Byte matcher: pseudo code would help. I haven't written many and it |
Let me know what you'd like for next steps here. What's remaining after #151 and #152
And possibly misc Windows/JRuby fixes mixed in I think we can leave out the Encoding internal/external/etc specs |
Extracting EncodedString specs from #134
Map string char with invalid encoding to '?' Format identical string expectation to read easier Refs: - rspec/rspec-core#1760 - via rspec#134
Map string char with invalid encoding to '?' Format identical string expectation to read easier Refs: - rspec/rspec-core#1760 - via rspec#134
Map string char with invalid encoding to '?' Format identical string expectation to read easier Refs: - rspec/rspec-core#1760 - via rspec#134 Set to pending for JRuby, opened issue jruby/jruby#2580 that JRuby is the only Ruby that returns "\x80" in place of "?"
Looks like the big goals of this 1 through 4 are done. The only things left include
And with that, I think we can close this PR. |
Related to rspec/rspec-core#1760
UPDATE 2015-02-8: Split into other PRs: